Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken encoding when using document fragments #99

Merged
merged 4 commits into from
Mar 18, 2021

Conversation

schlessera
Copy link
Collaborator

@schlessera schlessera commented Mar 16, 2021

This ensures that the http-equiv charset meta tag needed for making the DOMDocument work properly with UTF-8 encoding is always added, even when all or parts of the HTML document structure are missing.

The current implementation does multiple regex calls, with the assumption that this optimizes for large documents where a <head> tag would normally always be present. So for any full real-world HTML document, only the first regex would ever be used. For actual document fragments, this assumes they would be small anyway, making multiple regex traversals cheap in these cases.

These multiple regex calls could be combined into one, but this would likely make the most common case much worse in terms of performance. Thoughts on that, @westonruter ?

Fixes #28

@schlessera schlessera added Bug Something isn't working DOM labels Mar 16, 2021
@schlessera schlessera added this to the 0.2.0 milestone Mar 16, 2021
@codecov
Copy link

codecov bot commented Mar 16, 2021

Codecov Report

Merging #99 (4758e21) into main (7d6402e) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main      #99      +/-   ##
============================================
+ Coverage     80.87%   80.96%   +0.09%     
- Complexity      928      931       +3     
============================================
  Files            48       48              
  Lines          2311     2322      +11     
============================================
+ Hits           1869     1880      +11     
  Misses          442      442              
Flag Coverage Δ Complexity Δ
php 80.96% <100.00%> (+0.09%) 0.00 <3.00> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
src/Dom/Document.php 83.51% <100.00%> (+0.29%) 222.00 <3.00> (+3.00)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7d6402e...4758e21. Read the comment docs.

@schlessera schlessera requested a review from westonruter March 16, 2021 13:52
src/Dom/Document.php Outdated Show resolved Hide resolved
@schlessera schlessera requested a review from westonruter March 17, 2021 11:01
@schlessera schlessera merged commit 9d70600 into main Mar 18, 2021
@schlessera schlessera deleted the fix/28-broken-encoding-on-fragments-without-head branch March 18, 2021 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working DOM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Encoding issue when loading HTML fragment
2 participants